Acquisition of Subcategorization Frames from Large Scale Texts
Abstract
Subcategorization frames are useful for many applications, but pervasive ambiguity makes extracting them difficult. In this paper, a probabilistic chunker is used to determine plausible phrase boundaries, and a finite-state mechanism, SUBCAT-TRACTOR, is proposed to extract 23 subcategorization frames. To avoid the problems introduced by compound nouns, a noun-phrase extractor is applied. In addition, two extra rules are presented to capture movement phenomena.
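The abstract does not give SUBCAT-TRACTOR's internals, so the following is only a minimal sketch of the general idea: a finite-state automaton that reads the chunk sequence following a verb and reports the frame of the state it ends in. The chunk labels, the compound-noun lexicon `COMPOUNDS`, the transition table, and the frame names are all hypothetical and cover a handful of frames rather than the paper's 23. The merge step is a crude stand-in for a noun-phrase extractor, and the paper's two movement rules are not modeled.

```python
from typing import Dict, List, Optional, Set, Tuple

Chunk = Tuple[str, str]  # (chunk label, surface text); labels are hypothetical

# Hypothetical compound-noun lexicon standing in for a noun-phrase extractor.
COMPOUNDS: Set[str] = {"bank account", "stock market"}

# Illustrative transitions: (state, chunk label) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("V", "NP"): "V_NP",
    ("V", "PP"): "V_PP",
    ("V", "THAT"): "V_THAT",
    ("V", "TO_INF"): "V_TOINF",
    ("V_NP", "NP"): "V_NP_NP",
    ("V_NP", "PP"): "V_NP_PP",
    ("V_NP", "TO_INF"): "V_NP_TOINF",
}

# Accepting states and the frame each one reports (a small illustrative subset).
FRAMES: Dict[str, str] = {
    "V": "intransitive",
    "V_NP": "NP",
    "V_PP": "PP",
    "V_THAT": "that-S",
    "V_TOINF": "to-VP",
    "V_NP_NP": "NP NP",
    "V_NP_PP": "NP PP",
    "V_NP_TOINF": "NP to-VP",
}


def merge_compound_nouns(chunks: List[Chunk]) -> List[Chunk]:
    """Join two adjacent NP chunks when they form a known compound noun.

    Without this step, "her bank" + "account" would look like a
    double-object (NP NP) frame instead of a single NP object.
    """
    merged: List[Chunk] = []
    for label, text in chunks:
        if label == "NP" and merged and merged[-1][0] == "NP":
            prev_words = merged[-1][1].split()
            if prev_words and f"{prev_words[-1]} {text}" in COMPOUNDS:
                merged[-1] = ("NP", f"{merged[-1][1]} {text}")
                continue
        merged.append((label, text))
    return merged


def extract_frame(chunks: List[Chunk]) -> Optional[str]:
    """Run the automaton over the chunks that follow the verb."""
    state = "V"
    for label, _ in merge_compound_nouns(chunks):
        nxt = TRANSITIONS.get((state, label))
        if nxt is None:
            break  # assumption: any remaining chunks are treated as adjuncts
        state = nxt
    return FRAMES.get(state)
```

For example, `extract_frame([("NP", "him"), ("NP", "a book")])` yields `"NP NP"`, whereas `extract_frame([("NP", "her bank"), ("NP", "account")])` yields `"NP"` because the two chunks are merged into one compound noun first. Stopping at the first chunk with no transition is one simple way to keep trailing adjuncts such as "on Monday" out of the frame.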
Related papers
A System for Large-Scale Acquisition of Verbal, Nominal and Adjectival Subcategorization Frames from Corpora
This paper describes the first system for large-scale acquisition of subcategorization frames (SCFs) from English corpus data which can be used to acquire comprehensive lexicons for verbs, nouns and adjectives. The system incorporates an extensive rule-based classifier which identifies 168 verbal, 37 adjectival and 31 nominal frames from grammatical relations (GRs) output by a robust parser. The...
A Corpus-based Conceptual Clustering Method for Verb Frames and Ontology Acquisition
We describe in this paper the ML system, ASIUM, which learns subcategorization frames of verbs and ontologies from syntactic parsing of technical texts in natural language. The restrictions of selection in the subcategorization frames are filled by the concepts of the ontology. Applications requiring subcategorization frames and ontologies are crucial and numerous. The most direct applications ...
Lexical acquisition from corpora: the case of subcategorization frames in French
We present in this paper a method to automatically acquire a syntactic lexicon of subcategorization frames for French verbs directly from large corpora. The method is evaluated against existing lexical resources: we show that our system is capable of producing new frames that were not previously registered. Lastly, we show that it is possible to induce lexico-semantic classes « à la Levin » (19...
Automatic Extraction and Evaluation of Arabic LFG Resources
This paper presents the results of an approach to automatically acquire large-scale, probabilistic Lexical-Functional Grammar (LFG) resources for Arabic from the Penn Arabic Treebank (ATB). Our starting point is the earlier work of Tounsi et al. (2009) on automatic LFG f(eature)-structure annotation for Arabic using the ATB. They exploit tree configuration, POS categories, functional tags, lo...
A Subcategorization Frames Acquisition System for French Verbs
This paper presents a system intended to automatically acquire subcategorization frames (SCFs) of verbs from the analysis of large corpora. The system has been applied to a newspaper corpus (made of 10 years of the French newspaper Le Monde) and acquired subcategorization information for 3267 verbs. 286 SCFs were dynamically learnt for these verbs. From the analysis of 25 representative verbs, ...